Unofficial empeg BBS

#298669 - 25/05/2007 12:40 PDF Image -> PDF Txt?
Tim
veteran

Registered: 25/04/2000
Posts: 1522
Loc: Arizona
We got some new printers here with the capability to scan at the printer and save the PDF it creates on your computer back at your desk. The only issue with this is it takes what is scanned, converts it to an image and then saves that image in a PDF.

I have a TON of old reports (from the mid 70s on) that I would like to scan. My question is, is there any way to take an image in a PDF and convert it into text? I found several PDF -> Word/PowerPoint/Excel converters, but they just rip the image out and paste it into the other application. I'm looking to convert that picture back into text, using some kind of OCR I guess.

Anybody have any hints/tips on the easiest/cheapest way to do this?

Thanks!

#298670 - 25/05/2007 13:10 Re: PDF Image -> PDF Txt? [Re: Tim]
Schido
enthusiast

Registered: 29/03/2005
Posts: 364
Loc: Probably lost somewhere in Wal...
SimpleOCR?

http://www.simpleocr.com/

It only opens BMP, JPG, and TIFF files (and it seems it won't handle LZW-compressed TIFFs), and can only save as DOC or TXT.
But hey, it's free. (For non-commercial use, I've just noticed.)
_________________________
Empeg Mk1 #00177, 2.00 final, hijack 4.76

#298671 - 25/05/2007 15:18 Re: PDF Image -> PDF Txt? [Re: Tim]
tfabris
carpal tunnel

Registered: 20/12/1999
Posts: 31578
Loc: Seattle, WA
Quote:
The only issue with this is it takes what is scanned, converts it to an image and then saves that image in a PDF.

I'm probably being pedantic here, but just to be clear: All scanners can ever do is make images. It was always an image, so it was never "converted to an image".

The thing you're asking to do is, as you already know, OCR. Which is pre-built into many scanner software packages already. You said this was a new scanner/printer, so I'm surprised it's not already doing that for you. I'd look more carefully at the documentation and the bundled software that came with the printer. It's probably just a question of installing the right disc, clicking on the right icon, or setting the right configuration check box.
_________________________
Tony Fabris

#298672 - 25/05/2007 21:43 Re: PDF Image -> PDF Txt? [Re: tfabris]
tanstaafl.
carpal tunnel

Registered: 08/07/1999
Posts: 5543
Loc: Ajijic, Mexico
Quote:
The thing you're asking to do is, as you already know, OCR.


Admittedly it's been 10 years or more since I played with OCR, but has the software/hardware improved enough to be actually useful?

When I was trying it, they were bragging about 99% accuracy. That's all well and good, but on a full text page that amounts to about 40 errors I'd have to find and correct. It was a tossup whether it was easier to just type the document myself or scan it in and then spend 10 minutes fixing it. Add in the fact that it would try and divide the page up into frames based on the layout of the original document (all I wanted was just the plain text, conveniently paragraphed) and it just wasn't worth the trouble.
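
To put rough numbers on that, here is a minimal sketch of the arithmetic. It assumes a full page of text holds about 4,000 characters, which is the figure implied by "99% accuracy" giving "about 40 errors"; the post itself does not state a page size, and the function name is made up for illustration.

```python
def expected_errors(chars_per_page: int, accuracy: float) -> float:
    """Expected number of misread characters on one page."""
    return chars_per_page * (1.0 - accuracy)


# Assumption: roughly 4,000 characters on a full page of text.
CHARS_PER_PAGE = 4000

# Compare the quoted 99% figure with two hypothetical higher accuracies.
for accuracy in (0.99, 0.999, 0.9999):
    errors = expected_errors(CHARS_PER_PAGE, accuracy)
    print(f"{accuracy:.2%} accurate -> about {errors:.1f} errors per page")
```

The point is only that per-character accuracy has to climb well past 99% before the cleanup pass gets shorter than retyping the page.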

I take it things are better now?

tanstaafl.
_________________________
"There Ain't No Such Thing As A Free Lunch"

#298673 - 26/05/2007 01:14 Re: PDF Image -> PDF Txt? [Re: tanstaafl.]
msaeger
carpal tunnel

Registered: 23/09/2000
Posts: 3608
Loc: Minnetonka, MN
I haven't used OCR much, but I've been impressed the few times I have lately. I think it has improved.
_________________________

Matt

#298674 - 26/05/2007 14:57 Re: PDF Image -> PDF Txt? [Re: tfabris]
Tim
veteran

Registered: 25/04/2000
Posts: 1522
Loc: Arizona
When I asked our IT department about having it as words instead of images, they said the printers don't support that. Not sure if they really know what they are talking about, but the configuration of the printers is locked down and a password is required, so I can't go in and dork around to see if it is possible.

I do know there is nothing available in the software package that came with the printer. I think the fact that the computers and printer are locked down will make it almost impossible to get what I'm looking for.

#298675 - 26/05/2007 15:52 Re: PDF Image -> PDF Txt? [Re: Tim]
lectric
pooh-bah

Registered: 20/01/2002
Posts: 2085
Loc: New Orleans, LA
The printers typically WON'T support what you are asking for. There are plenty of software packages that can OCR images after the fact, such as OmniPage.

#298676 - 26/05/2007 20:04 Re: PDF Image -> PDF Txt? [Re: Tim]
tfabris
carpal tunnel

Registered: 20/12/1999
Posts: 31578
Loc: Seattle, WA
Quote:
they said the printers don't support that.

I don't know of any printers with built-in OCR. What I meant is that scanner/printers frequently come bundled with third-party OCR software. Since your IT people aren't handing you those disks, I suppose you're stuck trying to find your own OCR solution. One was linked earlier in this thread, and I'm sure there are gajillions of them.
_________________________
Tony Fabris

#298677 - 27/05/2007 06:22 Re: PDF Image -> PDF Txt? [Re: Tim]
altman
carpal tunnel

Registered: 19/05/1999
Posts: 3457
Loc: Palo Alto, CA
Acrobat (full version) does OCR on "scanned" PDFs, making a PDF with the text "behind" the scanned image - hence it still looks exactly like the original doc, but you can search. Acrobat also has multi-file search so you can search a whole hierarchy.

I've got one of the Fujitsu ScanSnap multi-page scanners that:

a) Come with the full version of Acrobat
b) Come with an OCR program that will scan, OCR, and save multipage (and duplex) documents with a single press on the scanner's scan button.
c) Really are very good value. They even do colour.

See http://empegbbs.com/ubbthreads/showflat.php?Cat=0&Board=offtopic&Number=249655

Hugo
